arXiv:1506.03797v1 [cs.CG] 11 Jun 2015


A Geometric Perspective on Sparse Filtrations* 


Nicholas J. Cavanna    Mahmoodreza Jahanseir
Donald R. Sheehy


Abstract 

We present a geometric perspective on sparse filtrations used in topological
data analysis. This new perspective leads to much simpler proofs,
while also being more general, applying equally to Rips filtrations and
Čech filtrations for any convex metric. We also give an algorithm for finding
the simplices in such a filtration and prove that the vertex removal
can be implemented as a sequence of elementary edge collapses.

A video illustrating this approach is available [7] as well as a short 
conference version [6]. 

1 Introduction 

Given a finite data set in a Euclidean space, it is natural to consider the balls 
around the data points as a way to fill in the space around the data and give 
an estimate of the missing data. The union of balls is often called the offsets 
of the point set. Persistent homology was originally invented as a way to study 
the changes in topology of the offsets of a point set as the radius increases 
from 0 to ∞. The input to persistent homology is usually a filtered simplicial
complex, that is, an ordered collection of simplices (vertices, edges, triangles, 
etc.) such that each simplex appears only after its boundary simplices of one 
dimension lower. The Nerve Theorem and its persistent variant allow one to 
compute the persistent homology of the offsets by instead looking at a discrete 
object, a filtered simplicial complex called the nerve (see Fig. 1). The simplest
version of this complex is called the Čech complex and it may be viewed as the
set of all subsets of the input, ordered by the radius of their smallest enclosing
ball. Naturally, the Čech complex gets very big very fast, even when restricting
to subsets of constant size. A common alternative is the Rips complex but
it suffers similar difficulties. Over the last few years, there have been several
approaches to building sparser complexes that still give good approximations to
the persistent homology [?].

Figure 1: A point set sampled on a sphere, its offsets, and its (sparsified) nerve
complex.


Our main contributions are the following. 

1. A much simpler explanation for the construction and proof of correctness 
of sparse filtrations. Our new geometric construction shows that the sparse 
complex is just a nerve in one dimension higher. 

2. The approach easily generalizes to Rips, Čech and related complexes (the
offsets for any convex metric). This is another advantage of the geometric
view as the main result follows from convexity rather than explicit
construction of simplicial map homotopy equivalences.

3. A simple geometric proof that the explicit removal of vertices from the 
sparse filtration can be done with simple edge contractions. This can be 
done without resorting to the full-fledged zig-zag persistence algorithm [?]
or even the full simplicial map persistence algorithm [?].

The most striking thing about this paper is perhaps more in what is absent 
than what is present. Despite giving a complete treatment of the construction, 
correctness, and approximation guarantees of sparse filtrations that applies to 
both Čech and Rips complexes, there is no elaborate construction of simplicial
maps or proofs that they induce homotopy equivalences. In fact, we prove the
results directly on the geometric objects, the covers, rather than the combinatorial
objects, the complexes, and the result is much more direct. In a way,
this reverses a common approach in computational geometry problems in which
the geometry is as quickly as possible replaced with combinatorial structure;
instead, we delay the transition from the offsets to a discrete representation until
the very end of the analysis.

Related Work. Soon after the introduction of persistent homology by
Edelsbrunner et al. [?], there was interest in building more elaborate complexes for
larger and larger data sets. Following the full algebraic characterization of
persistent homology by Zomorodian and Carlsson [?], a more general theory of
zigzag persistence was developed [?] using a more complicated algorithm.
Zig-zags gave a way to analyze spaces that did not grow monotonically;
they could alternately grow and shrink, such as by growing the scale and then
removing points [22]. A variant of this technique was first applied for specific
scales by Chazal and Oudot in work on manifold reconstruction and was
implemented as a full zigzag by Morozov in his Dionysus library [12]. Later, Sheehy
gave a zig-zag for Rips filtrations that came with guaranteed approximation to
the persistent homology of the unsparsified filtration [21]. Other later works
gave various improvements and generalizations of sparse zig-zags [20, ?].

2 Background 

Distances and Metrics. Throughout, we will assume the input is a finite
point set $P$ in $\mathbb{R}^d$ endowed with some convex metric $d$. A closed ball with
center $c$ and radius $r$ will be written as
$\mathrm{ball}(c, r) = \{x \in \mathbb{R}^d \mid d(x, c) \le r\}$. For
illustrative purposes, we will often draw balls as Euclidean ($\ell_2$) balls.

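For a non-negative $\alpha \in \mathbb{R}$, the $\alpha$-offsets of $P$ are defined as

$$P^\alpha := \bigcup_{p \in P} \mathrm{ball}(p, \alpha).$$

The sequence of offsets as $\alpha$ ranges from $0$ to $\infty$ is called the offsets
filtration $\{P^\alpha\}_{\alpha \ge 0}$.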

The doubling dimension of a metric space is $\log_2 \gamma$, where $\gamma$ is the maximum
over all balls $B$, of the minimum number of balls of half the radius of $B$ required
to cover $B$. Metric spaces with a small constant doubling dimension are called
doubling metrics. Such metrics allow for packing arguments similar to those used
in Euclidean geometry. For example, consider the following simple exercise. If
a set of points in a metric of doubling dimension $\rho$ are pairwise of distance at
least $\varepsilon$ apart and all contained in a ball of radius $c\varepsilon$, then there are fewer than
$(2c)^\rho$ points.

Simplicial Complexes. A simplicial complex $K$ is a family of subsets of
a vertex set that is closed under taking subsets. The sets $\sigma \in K$ are called
simplices and $|\sigma| - 1$ is called the dimension of $\sigma$. A nested family of simplicial
complexes is called a simplicial filtration. Often the family of complexes will be
parameterized by a nonnegative real number as in $\{K^\alpha\}_{\alpha \ge 0}$. Here, the filtration
property guarantees that $\alpha \le \beta$ implies that $K^\alpha \subseteq K^\beta$. In this case, the value
of $\alpha$ for which a simplex first appears is called its birth time, and so, if there is
a largest complex $K^\alpha$ in the filtration, the whole filtration can be represented
by $K^\alpha$ and the birth time of each simplex. For this reason, simplicial filtrations
are often called filtered simplicial complexes.

Persistent Homology. Homology is an algebraic tool for characterizing the
connectivity of a space. It captures information about the connected components,
holes, and voids. For this paper, we will only consider homology with
field coefficients and the computations will all be on simplicial complexes. In
this setting, computing homology is done by reducing a matrix $D$ called the
boundary matrix of the simplicial complex. The boundary matrix has one row
and column for each simplex. If the matrix reduction respects the order of a
filtration, i.e. columns are only combined with columns to their left, then the
reduced matrix also represents the so-called persistent homology of the filtration.
Persistent homology describes the changes in the homology as the filtration
parameter changes and this information is often expressed in a barcode (see Fig. 2).
Barcodes give topological signatures of a shape [14].
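
As a rough illustration of the reduction just described, the following sketch runs the standard left-to-right column reduction over the field Z/2 on a small filtered complex. The function name `reduce_boundary` and the column-as-set representation are illustrative choices and are not taken from the paper.

```python
def reduce_boundary(columns):
    """Left-to-right reduction of a Z/2 boundary matrix.

    columns[j] is the set of row indices holding a 1 in column j, with
    simplices indexed in filtration order. Returns (birth, death) index
    pairs; death is None for a class that never dies.
    """
    reduced = [set(c) for c in columns]
    pivot_owner = {}  # lowest nonzero row index -> column owning that pivot
    pairs = []
    for j, col in enumerate(reduced):
        # Only add columns to the left until the pivot of column j is unique.
        while col and max(col) in pivot_owner:
            col ^= reduced[pivot_owner[max(col)]]
        if col:
            pivot_owner[max(col)] = j
            pairs.append((max(col), j))
    paired = {b for b, _ in pairs} | {d for _, d in pairs}
    pairs += [(j, None) for j in range(len(columns)) if j not in paired]
    return sorted(pairs)


# Filtered triangle: vertices 0, 1, 2; edges 3={0,1}, 4={1,2}, 5={0,2}; face 6.
triangle = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
pairs = reduce_boundary(triangle)
```

Each returned pair corresponds to one bar: the simplex at the first index creates a feature and the simplex at the second index destroys it. Converting indices to birth times of the corresponding simplices gives the barcode.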



Figure 2: A filtration and its barcode. 

Each bar of a barcode is an interval encoding the lifespan of a topological
feature in the filtration. We say that a barcode $B_1$ is a (multiplicative)
$c$-approximation to another barcode $B_2$ if there is a partial matching between $B_1$
and $B_2$ such that every bar $[b, d]$ with $d/b > c$ is matched and every matched
pair of bars $[b, d], [b', d']$ satisfies $\max\{b/b', b'/b, d/d', d'/d\} \le c$. A standard
result on the stability of barcodes [5] implies that if two filtrations $\{F^\alpha\}$ and
$\{G^\alpha\}$ are $c$-interleaved in the sense that $F^{\alpha/c} \subseteq G^\alpha \subseteq F^{c\alpha}$, then the barcode of
$\{F^\alpha\}$ is a $c$-approximation to that of $\{G^\alpha\}$.

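Nerve Complexes and Filtrations. Let $\mathcal{U} = \{U_1, \ldots, U_n\}$ be a collection
of closed, convex sets. Let $\bigcup \mathcal{U}$ denote the union of the sets in $\mathcal{U}$, i.e.
$\bigcup \mathcal{U} := \bigcup_{i=1}^n U_i$. We say that the set $\mathcal{U}$ is a cover of the space $\bigcup \mathcal{U}$.
The nerve of $\mathcal{U}$, denoted $\mathrm{Nrv}(\mathcal{U})$, is the abstract simplicial complex defined as

$$\mathrm{Nrv}(\mathcal{U}) := \Big\{ I \subseteq [n] \;\Big|\; \bigcap_{i \in I} U_i \ne \emptyset \Big\}.$$

This construction is illustrated in Fig. 3. The Nerve Theorem [15, Cor. 4G.3]
implies that $\mathrm{Nrv}(\mathcal{U})$ is homotopy equivalent to $\bigcup \mathcal{U}$.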

Similarly, one can construct a nerve filtration from a cover of a filtration
by filtrations. Specifically, let $\mathcal{U} = \{\{U_1^\alpha\}, \ldots, \{U_n^\alpha\}\}$ be a collection of
filtrations parameterized by real numbers such that for each $i \in [n]$ and each $\alpha \ge 0$,
the set $U_i^\alpha$ is closed and convex. As shorthand, we write $\mathcal{U}^\alpha$ to denote the
set $\{U_1^\alpha, \ldots, U_n^\alpha\}$. As before, the Nerve Theorem implies that $\bigcup \mathcal{U}^\alpha$ is
homotopy equivalent to $\mathrm{Nrv}(\mathcal{U}^\alpha)$. The Persistent Nerve Lemma [9] implies that the
filtrations $\{\bigcup \mathcal{U}^\alpha\}_{\alpha \ge 0}$ and $\{\mathrm{Nrv}(\mathcal{U}^\alpha)\}_{\alpha \ge 0}$ have identical persistent homology.

Figure 3: The nerve has an edge for each pairwise intersection, a triangle for
each 3-way intersection (right), etc.

Čech and Rips Filtrations. A common filtered nerve is the Čech filtration.
It is defined as $\{\mathcal{C}_\alpha(P)\}$, where

$$\mathcal{C}_\alpha(P) := \mathrm{Nrv}\{\mathrm{ball}(p_i, \alpha) \mid i \in [n]\}.$$

Notice that this is just the nerve of the cover of the $\alpha$-offsets by the $\alpha$-radius
balls. Thus, the Persistent Nerve Lemma implies that $\{P^\alpha\}$ and $\{\mathcal{C}_\alpha(P)\}$ have
identical persistence barcodes.

A similar filtration that is defined for any metric is called the (Vietoris-)Rips
filtration and is defined as $\{\mathcal{R}_\alpha(P)\}$, where

$$\mathcal{R}_\alpha(P) := \Big\{ J \subseteq [n] \;\Big|\; \max_{i,j \in J} d(p_i, p_j) \le 2\alpha \Big\}.$$

Note that if $d$ is the max-norm, $\ell_\infty$, then $\mathcal{R}_\alpha(P) = \mathcal{C}_\alpha(P)$. Moreover, because
every finite metric can be isometrically embedded into $\ell_\infty$, every Rips filtration
is isomorphic to a nerve filtration.
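
To make the Rips definition concrete, here is a minimal brute-force sketch that lists the simplices of the Rips complex at a single scale. It assumes Euclidean distance via `math.dist` purely for illustration; the name `rips_complex` and the `max_dim` cutoff are hypothetical and do not come from the paper.

```python
import math
from itertools import combinations


def rips_complex(points, alpha, max_dim=2):
    """Index sets J with pairwise distance <= 2*alpha, up to dimension max_dim."""
    n = len(points)

    def close(i, j):
        return math.dist(points[i], points[j]) <= 2 * alpha

    simplices = [(i,) for i in range(n)]
    for size in range(2, max_dim + 2):
        for J in combinations(range(n), size):
            # A set is a simplex exactly when all of its pairs are close.
            if all(close(i, j) for i, j in combinations(J, 2)):
                simplices.append(J)
    return simplices


square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
small = rips_complex(square, 0.5)    # sides only: diagonals are too long
large = rips_complex(square, 0.75)   # diagonals now included
```

This enumeration is exponential in `max_dim` and quadratic or worse in the number of points, which is the blow-up that the sparse filtrations of the following sections are designed to avoid.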

Greedy Permutations. Let $P$ be a set of points in some metric space with
distance $d$. A greedy permutation of $P$ goes by many names, including landmark
sets, farthest point sampling, and discrete center sets. We say that
$P = \{p_1, \ldots, p_n\}$ is ordered according to a greedy permutation if each $p_i$ is
the farthest point from the first $i - 1$ points. We let $p_1$ be any point. Formally,
let $P_i = \{p_1, \ldots, p_i\}$ be the $i$th prefix. Then, the ordering is greedy if and only
if for all $i \in \{2, \ldots, n\}$,

$$d(p_i, P_{i-1}) = \max_{p \in P} d(p, P_{i-1}).$$

For each point $p_i$, the value $\lambda_i := d(p_i, P_{i-1})$ is known as the insertion radius.
By convention, we set $\lambda_1 = \infty$. It is well-known (and easy to check) that $P_i$ is
a $\lambda_i$-net in the sense that it satisfies the conditions: for all distinct $p, q \in P_i$,
$d(p, q) \ge \lambda_i$ (packing) and $P \subseteq P_i^{\lambda_i}$ (covering).

Figure 4: Left: two growing balls trace out cones in one dimension higher.
Center: One of the cones has a maximum radius. Right: Limiting the height of
one cone guarantees that the top is covered.
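
The greedy ordering and its insertion radii can be computed by a simple quadratic-time loop. The sketch below is only an illustration of the definition, assuming distinct points and Euclidean distance by default; `greedy_permutation` and its parameters are hypothetical names, and this is not the near-linear-time construction cited later in the paper.

```python
import math


def greedy_permutation(points, dist=math.dist, first=0):
    """Return (order, radii): a farthest-point ordering and insertion radii."""
    n = len(points)
    order = [first]
    radii = [math.inf]  # lambda_1 = infinity by convention
    # d_to_prefix[q] is the distance from point q to the current prefix P_i.
    d_to_prefix = [dist(points[q], points[first]) for q in range(n)]
    for _ in range(n - 1):
        nxt = max(range(n), key=lambda q: d_to_prefix[q])
        order.append(nxt)
        radii.append(d_to_prefix[nxt])
        for q in range(n):
            d_to_prefix[q] = min(d_to_prefix[q], dist(points[q], points[nxt]))
    return order, radii


line = [(0.0,), (1.0,), (4.0,), (9.0,)]
order, radii = greedy_permutation(line)
```

The insertion radii come out non-increasing, which is what makes each prefix a net at the scale of its last insertion radius.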

3 Perturbed Distances 

A convenient first step in making a sparse version of the Čech filtration is to
“perturb” the distance. Given a greedy permutation, we perturb the distance
function so that as the radius increases, only a sparse subset of points continues
to contribute to the offsets. This can most easily be viewed as changing the
radius of the balls slightly so that some balls will be completely covered by their
neighbors and thus will not contribute to the union. Fix a constant $\varepsilon < 1$ that
will control the sparsity. As we will show in Lemma 1, at scale $\alpha$, there is an
$\varepsilon\alpha$-net of $P$ whose perturbed offsets cover the perturbed offsets of $P$. Assuming
the points $P = \{p_1, \ldots, p_n\}$ are ordered by a greedy permutation with insertion
radii $\lambda_1, \ldots, \lambda_n$, we define the radius of $p_i$ at scale $\alpha$ as

$$r_i(\alpha) := \begin{cases} \alpha & \text{if } \alpha \le \lambda_i(1+\varepsilon)/\varepsilon \\ \lambda_i(1+\varepsilon)/\varepsilon & \text{otherwise.} \end{cases}$$

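The perturbed $\alpha$-offsets are defined as

$$\tilde{P}^\alpha := \bigcup_{i \in [n]} \mathrm{ball}(p_i, r_i(\alpha)).$$

To realize the sparsification as described, we want to remove balls associated
with some of the points as the scale increases. This is realized by defining the
$\alpha$-ball for a point $p_i \in P$ to be

$$b_i(\alpha) := \begin{cases} \mathrm{ball}(p_i, r_i(\alpha)) & \text{if } \alpha \le \lambda_i(1+\varepsilon)^2/\varepsilon \\ \emptyset & \text{otherwise.} \end{cases}$$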

The usefulness of this perturbation is captured by the following covering
lemma, which is depicted in the tops of the cones in Fig. 4.

Lemma 1 (Covering Lemma). Let $P = \{p_1, \ldots, p_n\}$ be a set of points ordered
by a greedy permutation with insertion radii $\lambda_1, \ldots, \lambda_n$. For any $\alpha, \beta \ge 0$, and
any $p_j \in P$, there exists a point $p_i \in P$ such that

1. if $\beta \ge \alpha$ then $b_j(\alpha) \subseteq b_i(\beta)$, and
2. if $\beta \ge (1 + \varepsilon)\alpha$, then $\mathrm{ball}(p_j, \alpha) \subseteq b_i(\beta)$.

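Proof. Fix any $p_j \in P$. We may assume that $\beta > \lambda_j(1 + \varepsilon)^2/\varepsilon$, for otherwise,
choosing $p_i = p_j$ suffices to satisfy both clauses, the first because $b_j(\alpha) \subseteq
b_j(\beta)$ and the second because $\mathrm{ball}(p_j, \alpha) = b_j(\alpha) \subseteq b_j(\beta)$. This assumption is
equivalent to the assumption that $b_j(\beta) = \emptyset$.

By the covering property of the greedy permutation, there is a point $p_i \in P$
such that $d(p_i, p_j) \le \varepsilon\beta/(1 + \varepsilon)$ and $\lambda_i \ge \varepsilon\beta/(1 + \varepsilon)$. It follows that $r_i(\beta) = \beta$
and $b_i(\beta) = \mathrm{ball}(p_i, \beta)$. Recall that $\lambda_1 = \infty$ by convention, so $b_1(\beta) \ne \emptyset$, and
for large values of $\beta$, choosing $p_i = p_1$ suffices.

To prove the first clause, fix any point $x \in b_j(\alpha)$. By the triangle inequality,

$$\begin{aligned}
d(x, p_i) &\le d(x, p_j) + d(p_i, p_j) \le r_j(\alpha) + \varepsilon\beta/(1 + \varepsilon) \\
&\le \lambda_j(1 + \varepsilon)/\varepsilon + \varepsilon\beta/(1 + \varepsilon) \le \beta = r_i(\beta).
\end{aligned}$$

So, $x \in b_i(\beta)$ and thus, $b_j(\alpha) \subseteq b_i(\beta)$ as desired.

To prove the second clause of the lemma, fix any $x \in \mathrm{ball}(p_j, \alpha)$. By the
triangle inequality,

$$\begin{aligned}
d(x, p_i) &\le d(x, p_j) + d(p_i, p_j) \le \alpha + \varepsilon\beta/(1 + \varepsilon) \\
&\le \beta/(1 + \varepsilon) + \varepsilon\beta/(1 + \varepsilon) = r_i(\beta).
\end{aligned}$$

So, as before, $x \in b_i(\beta)$ and thus, $\mathrm{ball}(p_j, \alpha) \subseteq b_i(\beta)$ as desired. □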

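Corollary 2. Let $P = \{p_1, \ldots, p_n\}$ be a set of points ordered by a greedy
permutation with insertion radii $\lambda_1, \ldots, \lambda_n$. For all $\alpha \ge 0$, $\tilde{P}^\alpha = \bigcup_i b_i(\alpha)$ and

$$\tilde{P}^\alpha \subseteq P^\alpha \subseteq \tilde{P}^{(1+\varepsilon)\alpha}.$$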

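Proof. We will first show that $\tilde{P}^\alpha = \bigcup_i b_i(\alpha)$.

Fix any $\alpha \ge 0$. For all $j \in [n]$, $b_j(\alpha) \subseteq \mathrm{ball}(p_j, r_j(\alpha))$, so

$$\bigcup_{j \in [n]} b_j(\alpha) \subseteq \bigcup_{j \in [n]} \mathrm{ball}(p_j, r_j(\alpha)) = \tilde{P}^\alpha. \qquad (1)$$

To show that $\tilde{P}^\alpha = \bigcup_{j \in [n]} \mathrm{ball}(p_j, r_j(\alpha)) \subseteq \bigcup_{j \in [n]} b_j(\alpha)$, we have two cases.
If $\alpha \le \lambda_j(1+\varepsilon)^2/\varepsilon$, then $b_j(\alpha) = \mathrm{ball}(p_j, r_j(\alpha))$. Else $\alpha > \lambda_j(1+\varepsilon)^2/\varepsilon$, which
implies that $r_j(\alpha) = \lambda_j(1+\varepsilon)/\varepsilon$. Let $\gamma = \lambda_j(1+\varepsilon)/\varepsilon = r_j(\alpha)$, which implies
$r_j(\gamma) = \gamma$ and $\alpha > (1 + \varepsilon)\gamma$, so there exists $i$ such that $\mathrm{ball}(p_j, \gamma) \subseteq b_i(\alpha)$ and
equivalently $\mathrm{ball}(p_j, r_j(\alpha)) \subseteq b_i(\alpha)$. Thus,

$$\tilde{P}^\alpha = \bigcup_{j \in [n]} \mathrm{ball}(p_j, r_j(\alpha)) \subseteq \bigcup_{j \in [n]} b_j(\alpha). \qquad (2)$$

So (1) and (2) imply that $\tilde{P}^\alpha = \bigcup_i b_i(\alpha)$.

Now, we will prove that $\tilde{P}^\alpha \subseteq P^\alpha \subseteq \tilde{P}^{(1+\varepsilon)\alpha}$.

$$\tilde{P}^\alpha = \bigcup_{j \in [n]} \mathrm{ball}(p_j, r_j(\alpha)) \subseteq \bigcup_{j \in [n]} \mathrm{ball}(p_j, \alpha) = P^\alpha, \qquad (3)$$

because $r_j(\alpha) \le \alpha$. Let $\beta = (1 + \varepsilon)\alpha$, then for all $j \in [n]$ there exists $i$ such that
$\mathrm{ball}(p_j, \alpha) \subseteq b_i(\beta)$ by statement 2 in Lemma 1, implying

$$P^\alpha = \bigcup_{j \in [n]} \mathrm{ball}(p_j, \alpha) \subseteq \bigcup_{j \in [n]} b_j(\beta) = \tilde{P}^\beta = \tilde{P}^{(1+\varepsilon)\alpha}. \qquad (4)$$

Thus (3) and (4) imply that $\tilde{P}^\alpha \subseteq P^\alpha \subseteq \tilde{P}^{(1+\varepsilon)\alpha}$. □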

Corollary 2 implies the following proposition using standard results on the
stability of persistence barcodes [5].

Proposition 3. The persistence barcode of the perturbed offsets $\{\tilde{P}^\alpha\}_{\alpha \ge 0}$ is a
$(1 + \varepsilon)$-approximation to the persistence barcode of the offsets $\{P^\alpha\}_{\alpha \ge 0}$.

4 Sparse Filtrations 

The sparse Čech complex is defined as $Q^\alpha := \mathrm{Nrv}\{b_i(\alpha) \mid i \in [n]\}$. Notice that
because $b_i(\alpha) = \emptyset$ unless $\lambda_i$ is sufficiently large compared to $\alpha$, there are fewer
vertices as the scale increases. This is the desired sparsification. Unfortunately,
it means that the set of complexes $\{Q^\alpha\}$ is not a filtration, but this is easily
remedied by the following definition. The sparse Čech filtration is defined as
$\{S^\alpha\}$, where

$$S^\alpha := \bigcup_{\delta \le \alpha} Q^\delta = \bigcup_{\delta \le \alpha} \mathrm{Nrv}\{b_i(\delta) \mid i \in [n]\}.$$

This definition makes it clear that the sparse complex is a union of nerves,
but it is not obvious that it has the same persistent homology as the filtration
defined by the perturbed offsets $\tilde{P}^\alpha := \bigcup_i b_i(\alpha)$. For such a statement, it would
be much more convenient if $\{S^\alpha\}$ was itself a nerve filtration rather than a union
of nerves, in which case the Persistent Nerve Lemma could be applied directly.
In fact, this can be done by adding an extra dimension corresponding to the
filtration parameter, extending the balls $b_i(\alpha)$ into the perturbed cone shapes

$$U_i^\alpha := \bigcup_{\delta \le \alpha} \big( b_i(\delta) \times \{\delta\} \big).$$

These sets, depicted in Figs. 4 and 5, allow the following equivalent definition
of the complexes in the sparse Čech filtration.

$$S^\alpha := \mathrm{Nrv}\{U_i^\alpha \mid i \in [n]\}.$$

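Proposition 4. If $d$ is a convex metric and $r_i$ is a concave function then
$U_i^\alpha$ is convex.

Proof. Given two points $(a, \delta_a), (b, \delta_b) \in U_i^\alpha$, $d(a, p_i) \le r_i(\delta_a)$ and likewise
$d(b, p_i) \le r_i(\delta_b)$ by definition of $U_i^\alpha$. Let $c = (1 - t)a + tb$ and let
$\delta_c = (1 - t)\delta_a + t\delta_b$, for $t \in [0, 1]$. Now we bound $d(c, p_i)$ as follows.

$$\begin{aligned}
d(c, p_i) &\le (1 - t)\,d(a, p_i) + t\,d(b, p_i) && [d \text{ is convex}] \\
&\le (1 - t)\,r_i(\delta_a) + t\,r_i(\delta_b) \\
&\le r_i(\delta_c) && [r_i \text{ is concave}]
\end{aligned}$$

Thus we can conclude that $(c, \delta_c)$, a convex combination of arbitrary $(a, \delta_a)$ and
$(b, \delta_b)$, is in $U_i^\alpha$ and $U_i^\alpha$ is convex. □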

Theorem 5. The persistence barcode of the sparse nerve filtration $\{S^\alpha\}_{\alpha \ge 0}$ is
a $(1 + \varepsilon)$-approximation to the persistence barcode of the offsets $\{P^\alpha\}_{\alpha \ge 0}$.

Proof. For all $i$, the set $U_i^\alpha$ is convex because $r_i$ is concave, by Proposition 4. It
follows that the sets $U_i^\alpha$ satisfy the conditions of the Persistent Nerve Lemma.
So, $\{S^\alpha\}$ has the same persistence barcode as the filtration $\{B^\alpha\}$, where
$B^\alpha := \bigcup_{i \in [n]} U_i^\alpha$.

Figure 5: The collection of cones at two different scales. The top of the
cones is the union of (perturbed) balls.

The Covering Lemma implies that the linear projection of $B^\alpha$ to $\tilde{P}^\alpha$ that
maps $(x, \delta)$ to $x$ is a homotopy equivalence as each fiber is simply connected.
Moreover, the projection clearly commutes with the inclusions $B^\alpha \hookrightarrow B^\beta$ and
$\tilde{P}^\alpha \hookrightarrow \tilde{P}^\beta$, from which it follows that
$\mathrm{Pers}\{\tilde{P}^\alpha\} = \mathrm{Pers}\{B^\alpha\} = \mathrm{Pers}\{S^\alpha\}$. So,
the claim now follows from Proposition 3. □


5 Algorithms 

In previous work, it was shown how to use metric data structures [?] to compute
the sparse Rips filtration in $O(n \log n)$ time [21] when the doubling dimension
is constant. The same approach also works for the sparse nerve filtrations
described here. However, it depends on the construction of a net-tree [?], which
is an intricate data structure.

In this section, we present a simpler technique to construct a sparse nerve
filtration from a greedy permutation of a finite metric $(P, d)$. Throughout, we
assume that the doubling dimension of $(P, d)$ is constant. We show how to
construct a sparse nerve filtration in linear time from the greedy permutation.
Our approach starts with finding all edges and their birth times.

Let $G$ be a directed graph whose vertices are the points of $P$ and whose edges
are the edges of the sparse nerve filtration of $P$ directed from smaller to larger
insertion radius. In Section 5.1, it is shown that for each directed edge $(p_i, p_j)$
in $G$, $d(p_i, p_j) \le \kappa\lambda_i$, for a constant $\kappa$. This reduces the problem of finding the
edges of the filtration to the problem of finding points in a given neighborhood.
Moreover, we show that the out-degree of a vertex in $G$ is constant. Then, in
Section 5.2 we present an algorithm to construct $G$ from the greedy permutation
and show that it runs in linear time. Finally, in Section 5.3 we give an algorithm
for building higher dimensional simplices using the directed graph and bound
its running time.


5.1 Finding Neighborhoods Suffices 

The vertices adjacent to $p_i$ in the directed graph $G$ are the points $p_j$ with
insertion radius at least that of $p_i$ such that their corresponding balls intersect
at some scale $\alpha$. The following lemma shows that these points have distance at
most a constant times $\lambda_i$ to $p_i$. Then, Lemma 7 will use this fact to show that
the number of adjacent vertices is at most a constant.

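Lemma 6. For a given point $p_i$ with insertion radius $\lambda_i$ in the directed graph
$G$, all adjacent points to $p_i$ are located in $\mathrm{ball}(p_i, \kappa\lambda_i)$, where
$\kappa = \frac{\varepsilon^2 + 3\varepsilon + 2}{\varepsilon}$ and $\varepsilon > 0$.

Proof. In the directed graph $G$, a vertex $p_j$ is adjacent to vertex $p_i$ if $\lambda_i \le \lambda_j$ and
for some scale $\alpha$, $b_i(\alpha) \cap b_j(\alpha) \ne \emptyset$. These balls intersect before $p_i$ disappears,
so

$$b_i(\lambda_i(1 + \varepsilon)^2/\varepsilon) \cap b_j(\lambda_i(1 + \varepsilon)^2/\varepsilon) \ne \emptyset.$$

The distance between $p_i$ and $p_j$ is bounded as follows.

$$\begin{aligned}
d(p_i, p_j) &\le r_i(\lambda_i(1 + \varepsilon)^2/\varepsilon) + r_j(\lambda_i(1 + \varepsilon)^2/\varepsilon) \\
&\le \lambda_i(1 + \varepsilon)/\varepsilon + \lambda_i(1 + \varepsilon)^2/\varepsilon \\
&= \frac{\varepsilon^2 + 3\varepsilon + 2}{\varepsilon}\,\lambda_i.
\end{aligned}$$

Thus, all adjacent vertices to $p_i$ lie in a ball with center $p_i$ and radius $\kappa\lambda_i$. □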

Lemma 7. For a point set $P$ ordered by a greedy permutation and with doubling
dimension $\rho$, each $p_i \in P$ has $\kappa^{O(\rho)}$ neighbors in the directed graph $G$, where
$\kappa = \frac{\varepsilon^2 + 3\varepsilon + 2}{\varepsilon}$ and $\varepsilon > 0$.

Proof. The proof uses a standard packing argument. By the definition of the
doubling dimension, $\mathrm{ball}(p_i, \kappa\lambda_i)$ can be covered by $2^{\rho(\lceil \lg \kappa \rceil + 1)}$ balls of radius less
than $\lambda_i$. Since the neighbors are pairwise $\lambda_i$-separated, there can be at most one
point in each such ball. Therefore, the number of balls, and hence of neighbors,
is $\kappa^{O(\rho)}$. □

5.2 How to find neighborhoods using a greedy permutation
In this section, we construct the directed graph $G$ as described from a given
greedy permutation. In Section 5.1 it was shown that to construct $G$ it suffices
to find points within a metric ball around each point. We build an efficient data
structure to maintain these points.

Let $P = \{p_1, \ldots, p_n\}$ be the points in $(P, d)$ ordered according to a greedy
permutation. For each $p_i \in P$, let $\mathrm{pred}(p_i) \in P_{i-1}$ denote the nearest point to
$p_i$ among the first $i - 1$ points in the ordering. So, the insertion radius of $p_i$ is
$\lambda_i = d(p_i, \mathrm{pred}(p_i))$. The level of $p_i$ is defined as $\ell_i := \lceil \lg \lambda_i \rceil$.

The goal is to process the points one at a time in the greedy ordering, and for
each $p_i$, to find all preceding points within distance $\kappa\lambda_i$, where $\kappa = (\varepsilon^2 + 3\varepsilon + 2)/\varepsilon$
and $\varepsilon > 0$ is a fixed constant chosen by the user. Because all neighbors of $p_i$ in
a sparse nerve filtration have this property by Lemma 6, we can use this list to
find all the neighbors.

We will define a data structure $\mathcal{D}$ used to extract neighborhood information
in the directed graph $G$. For each point $p_i$ in $P$, $\mathcal{D}$ stores $\mathrm{pred}(p_i)$, $\ell_i$, and three
other pieces of information:

1. a point $\mathrm{parent}(p_i)$ called the parent,

2. a list of points $\mathrm{nbr}(p_i)$ called the neighbors, and

3. a list of points $\mathrm{ch}(p_i)$ called the children of $p_i$.

These three objects change over the course of the algorithm. We only require
that for all $i \in [n]$ and all $p_j \in P_i$, they satisfy the following invariants after $i$
points have been processed.

1. Parent Invariant: $\mathrm{parent}(p_j) = p_j$ if $\ell_j > \ell_i$. Otherwise, $\mathrm{parent}(p_j)$ is a
point $p_k$ such that $\ell_k > \ell_i$ and $d(p_j, p_k) \le 2^{\ell_i}$.

2. Child Invariant: $\mathrm{ch}(p_j) \supseteq \{p_j\} \cup \{p_k \in P_i \mid \mathrm{parent}(p_k) = p_j \text{ and } \ell_k = \ell_i\}$.

3. Neighbor Invariant: $\mathrm{nbr}(p_j) \supseteq \{p_k \in P_i \mid d(p_j, p_k) \le \ldots\}$.

The second invariant states that the children list of pj contains all points at 
the same level as Pi that have pj as a parent. The third invariant says that the 
neighbor lists contain all nearby points where “nearby” is related to the insertion 
radius of pi. This last invariant implies the correctness of the algorithm, because 
for j = i, it says the neighbor list contains the set we are interested in. We 
maintain the lists for the other points to help us do updates at each step. 

Furthermore, we assume that $\mathcal{D}$ provides constant-time access to the list of
points in a specific level.

Algorithm 1 shows how a new point $p_i$ can be inserted into the data structure
$\mathcal{D}$. In fact, we process points of a greedy permutation one by one and after
inserting a new point in $\mathcal{D}$, we update the directed graph $G$, which is used to
extract higher dimensional simplices.

Algorithm 1 Inserting a new point into the data structure $\mathcal{D}$
procedure Insert($\mathcal{D}$, $p_i$)
    if $\ell_i < \ell_{i-1}$ then
        for all $p_k$ such that $\ell_k = \ell_{i-1}$ do
            $\mathrm{parent}(p_k) \leftarrow p_k$
    $p_j \leftarrow \mathrm{pred}(p_i)$
    $\mathrm{parent}(p_i) \leftarrow \mathrm{parent}(p_j)$
    for all $p_k \in \mathrm{nbr}(\mathrm{parent}(p_j))$ do
        if $d(p_i, p_k) < d(p_i, \mathrm{parent}(p_i))$ and $\ell_k > \ell_i$ then
            $\mathrm{parent}(p_i) \leftarrow p_k$
    add $p_i$ to $\mathrm{ch}(p_i)$
    add $p_i$ to $\mathrm{ch}(\mathrm{parent}(p_i))$
    add $p_i$ to $\mathrm{nbr}(p_i)$
    for all $p_k \in \mathrm{ch}(\mathrm{nbr}(\mathrm{parent}(p_i)))$ do
        if $d(p_i, p_k) \le \kappa 2^{\ell_i}$ then
            add $p_k$ to $\mathrm{nbr}(p_i)$
            add $p_i$ to $\mathrm{nbr}(p_k)$


Lemma 8. Let $P = (p_1, \ldots, p_n)$ be a greedy permutation. For all $i \in \{2, \ldots, n\}$,
if $\mathcal{D}$ is a data structure on $P_{i-1}$ satisfying the three invariants, then it also
satisfies the invariants after calling Insert($\mathcal{D}$, $p_i$).

Proof. We consider the invariants one at a time.

First, if $\ell_i < \ell_{i-1}$, the algorithm updates the parents of all nodes in level
$\ell_{i-1}$. Note that these are the only points required to be updated to satisfy the
Parent Invariant for all points in $P_{i-1}$.

Next, we check that there exists a point $p_k$ such that setting $\mathrm{parent}(p_i)$ to
$p_k$ satisfies the Parent Invariant. The algorithm iterates over $\mathrm{nbr}(\mathrm{parent}(p_j))$
to find the closest point with a level higher than $\ell_i$. We first show there exists
a point in a higher level that satisfies the Parent Invariant and then show that
any such point is in $\mathrm{nbr}(\mathrm{parent}(p_j))$. Let $z$ be the largest index less than $i$
such that $\ell_z > \ell_i$. Let $p_k$ be the closest point in $P_z$ to $p_i$. So,

$$d(p_i, p_k) = d(p_i, P_z) \le \max_{p \in P} d(p, P_z) = \lambda_{z+1} \le 2^{\ell_{z+1}} \le 2^{\ell_i}.$$

Thus, some point $p_k$ could satisfy the Parent Invariant. Any such point $p_k$
satisfies

$$\begin{aligned}
d(p_k, \mathrm{parent}(p_j)) &\le d(p_k, p_i) + d(p_i, p_j) + d(p_j, \mathrm{parent}(p_j)) \\
&\le 2^{\ell_i} + \lambda_i + 2^{\ell_i} \\
&\le 2^{\ell_i} + 2^{\ell_i} + 2^{\ell_i} \\
&= \tfrac{3}{2}\, 2^{\ell_i + 1} \\
&\le \kappa 2^{\ell_i + 1}.
\end{aligned}$$

Therefore, $p_k \in \mathrm{nbr}(\mathrm{parent}(p_j))$ by the Neighbor Invariant.

For the Child Invariant, $p_i$ needs to be inserted into $\mathrm{ch}(\mathrm{parent}(p_i))$. No other
children lists need to change to satisfy the invariant.

Next, to satisfy the Neighbor Invariant, neighbor lists should be updated.
This only involves finding the neighbor list of $p_i$ and also adding $p_i$ to the
neighbor lists of its neighbors. For this step, it suffices to check that if $p_k$ must
be added to $\mathrm{nbr}(p_i)$, i.e. if $d(p_i, p_k) \le \kappa 2^{\ell_i}$, then $p_k \in \mathrm{ch}(\mathrm{nbr}(\mathrm{parent}(p_i)))$.
That is, the neighbors of $p_i$ are all children of neighbors of the parent of $p_i$.
This follows from the triangle inequality and the invariants for $i - 1$ as follows.

$$\begin{aligned}
d(\mathrm{parent}(p_k), \mathrm{parent}(p_i)) &\le d(\mathrm{parent}(p_k), p_k) + d(p_k, p_i) + d(p_i, \mathrm{parent}(p_i)) \\
&\le 2^{\ell_i} + \kappa 2^{\ell_i} + 2^{\ell_i} \\
&= (1 + \kappa/2)\, 2^{\ell_i + 1} \\
&\le \kappa 2^{\ell_i + 1}.
\end{aligned}$$

So, it follows that $\mathrm{parent}(p_k) \in \mathrm{nbr}(\mathrm{parent}(p_i))$, and so
$p_k \in \mathrm{ch}(\mathrm{parent}(p_k)) \subseteq \mathrm{ch}(\mathrm{nbr}(\mathrm{parent}(p_i)))$. If $p_k$ is added to $\mathrm{nbr}(p_i)$, then it is required to add $p_i$ to
$\mathrm{nbr}(p_k)$ and the algorithm does this. □

Algorithm 2 constructs all edges that appear in a sparse filtration. It receives
a set of points $P$, which is ordered by a greedy permutation, as input and returns
a directed graph $G$. As we mentioned earlier, we will use the directed graph $G$ to
find higher dimensional simplices. For each point $p_i$, the algorithm invokes the
Insert procedure to find its neighbors. Then, to build sparse edges between
$p_i$ and its neighbors, Algorithm 3 is called. If an edge appears in the sparse
filtration, the EdgeBirthTime method returns the birth time of the edge, and $\infty$
otherwise. Finally, for an edge in the sparse filtration, a directed edge from $p_i$
to $p_j$ will be inserted into $G$.

Algorithm 2 Constructing edges of a sparse filtration
procedure ConstructEdges($P = \{p_1, \ldots, p_n\}$)
    initialize $\mathcal{D}$ with $p_1$    ▷ adds $p_1$ to $\mathrm{ch}(p_1)$ and sets $\mathrm{parent}(p_1) = p_1$
    initialize a directed graph $G$ on $P$
    for $i = 2$ to $n$ do
        Insert($\mathcal{D}$, $p_i$)
        for all $p_j \in \mathrm{nbr}(p_i)$ do
            $\alpha \leftarrow$ EdgeBirthTime($p_i$, $p_j$)
            if $\alpha < \infty$ then
                add a directed edge from $p_i$ to $p_j$ with birth time $\alpha$ to $G$
    return $G$


Theorem 9. Given a greedy permutation of a finite metric $(P, d)$ of constant
doubling dimension and the nearest predecessors $\mathrm{pred}(p)$ for each $p \in P$, one
can compute the edges of the sparse nerve filtration of $(P, d)$ in $O(n)$ time.

Algorithm 3 Compute the birth time of an edge
procedure EdgeBirthTime($p_i$, $p_j$)
    if $\lambda_i > \lambda_j$ then
        swap $p_i$ and $p_j$
    if $d(p_i, p_j) \le \ldots$ then
        return $\ldots$
    if $d(p_i, p_j) \le \ldots$ then
        return $d(p_i, p_j) - \ldots$
    return $\infty$


Proof. Algorithm 2 finds all edges in a sparse filtration. The running time of
this algorithm mainly depends on the running time of the Insert procedure and
the size of the neighbor list for each point.

In Algorithm 1, the most common operation for the lists $\mathrm{nbr}(p_j)$ and $\mathrm{ch}(p_j)$
is to enumerate their elements. Any time a list is enumerated, we can check
each point in constant time to see if it is still required to satisfy the invariant
and remove it otherwise. Note that although the invariants only specify a subset
that must appear, it is easy to check that enumerating these lists can be done in
amortized constant time. This follows from two facts. First, the required subsets
have constant size (by standard packing arguments). Second, the number of
removals is at most the number of insertions, so we charge the cost of visiting
such a point in the enumeration to the cost of its insertion.

In addition, when inserting $p_i$, if $\ell_i < \ell_{i-1}$, then $\mathrm{parent}(p_k)$ is updated for
all $p_k$ such that $\ell_k = \ell_{i-1}$. The total cost of such operations is $O(n)$ as no
parent is updated twice.

After insertion of a point $p_i$ into $\mathcal{D}$, Algorithm 3 is called for all points in
$\mathrm{nbr}(p_i)$ to check whether an edge belongs to the sparse filtration. This algorithm
has a constant running time. In addition, by Lemma 7, the size of a neighbor
list for each point is constant. Therefore, for each point, the cost of finding
these edges in the sparse filtration is $O(1)$. □
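
For concreteness, an edge birth time can be worked out directly from the definitions of $r_i$ and $b_i$ in Section 3. The sketch below is such a derivation, not a transcription of Algorithm 3: it assumes that two balls intersect exactly when the distance between their centers is at most the sum of their radii (true for norm-induced metrics on $\mathbb{R}^d$), and the name `edge_birth_time` and its argument layout are hypothetical.

```python
import math


def edge_birth_time(d, lam_i, lam_j, eps):
    """Smallest scale at which b_i and b_j are both present and intersect.

    d is the distance between the two points; lam_i and lam_j are their
    insertion radii. Returns math.inf if the balls never meet while both
    are present. Assumes balls meet iff d <= sum of their radii.
    """
    if lam_i > lam_j:
        lam_i, lam_j = lam_j, lam_i
    cap_i = lam_i * (1 + eps) / eps            # scale at which r_i freezes
    cap_j = lam_j * (1 + eps) / eps            # scale at which r_j freezes
    removal_i = lam_i * (1 + eps) ** 2 / eps   # last scale with b_i nonempty
    if d <= 2 * cap_i:
        return d / 2           # both balls still growing: radii are alpha each
    alpha = d - cap_i          # r_i frozen at cap_i, r_j still equal to alpha
    if alpha <= min(cap_j, removal_i):
        return alpha
    return math.inf


both_growing = edge_birth_time(4.0, 1.0, 10.0, 0.5)
one_frozen = edge_birth_time(7.0, 10.0, 1.0, 0.5)
never = edge_birth_time(8.0, 1.0, 10.0, 0.5)
```

The three branches correspond to the three regimes of the smaller ball: still growing, frozen but present, and removed. Under the stated assumption each call takes constant time, in line with the constant running time used in the proof above.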

5.3 Higher Dimensional Simplices 

In the previous section, it is shown that from a greedy permutation, the edges
of a sparse nerve filtration can be constructed in linear time. Now, we present
an algorithm to find $k$-simplices in the sparse filtration for $k > 1$. As mentioned
earlier, the directed graph $G$ built from the edges of the sparse nerve filtration
will be used to construct higher dimensional simplices.

Let $E(v)$ be the vertices adjacent to a vertex $v$ in $G$ (for each $u \in E(v)$, there
is a directed edge from $v$ to $u$). To find a $k$-simplex for $k > 1$ containing a vertex
$v$, we consider all subsets $\{u_1, \ldots, u_k\}$ of $k$ vertices in $E(v)$. If $\{v, u_1, \ldots, u_k\}$
forms a $(k + 1)$-clique, we check the clique to see whether it creates a $k$-simplex
and compute its birth time. The birth time of a $k$-simplex $\sigma$ in a nerve filtration
is defined as follows.

$$\mathrm{SimplexBirthTime}(\sigma) := \min\Big\{ \alpha : \bigcap_{i \in \sigma} U_i^\alpha \ne \emptyset \Big\} = \min\Big\{ \alpha : \bigcap_{j \in \sigma} b_j(\alpha) \ne \emptyset \Big\}.$$

If no such $\alpha$ exists, then we define the birth time to be $\infty$. We assume the
user provides a method, SimplexBirthTime, to compute birth times for their
metric that runs in time polynomial in $k$. This function takes a $(k + 1)$-clique as
input. If at some scale $\alpha$, the corresponding balls have a common intersection,
it returns the minimum such $\alpha$, otherwise, it returns $\infty$ indicating the
$(k + 1)$-clique is not a $k$-simplex in the sparse filtration.

For the case of Rips filtrations (i.e. $\ell_\infty$), SimplexBirthTime($\sigma$) just needs
to compute the maximum birth time of the edges and compare it to
$\min_{p_i \in \sigma} \lambda_i(1 + \varepsilon)^2/\varepsilon$ (the first time $t$ after which some $p_i \in \sigma$ has $b_i(t) = \emptyset$). For $\ell_2$, the
corresponding computation is a variation of the minimum enclosing ball problem.

Algorithm 4 finds the $k$-simplices and birth times in a sparse filtration. In
this algorithm, $G$ is the given directed graph and the output $S$ is the set of pairs
$(\sigma, t)$, where $\sigma$ is a $k$-simplex and $t$ is its birth time.



Algorithm 4 Find all $k$-simplices and birth times
procedure FindSimplices($G$, $k$)
    $S \leftarrow \emptyset$
    for all vertex $v$ in $G$ do
        for all $\{u_1, \ldots, u_k\} \subseteq E(v)$ do
            if $\{v, u_1, \ldots, u_k\}$ is a $(k + 1)$-clique then
                $\sigma \leftarrow \{v, u_1, \ldots, u_k\}$
                $t \leftarrow$ SimplexBirthTime($\sigma$)
                if $t < \infty$ then
                    $S \leftarrow S \cup \{(\sigma, t)\}$
    return $S$
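
A minimal sketch of Algorithm 4 follows, together with a Rips-style birth time matching the description above (maximum edge birth time, compared against the earliest removal time of a vertex). The data layout (`out_nbrs` as a dict of out-neighbor sets, `edge_birth` keyed by `frozenset` pairs, `lam` as a list of insertion radii) and all function names are assumptions made for this illustration.

```python
import math
from itertools import combinations


def find_simplices(out_nbrs, edge_birth, k, simplex_birth_time):
    """Return (simplex, birth time) pairs for the k-simplices.

    out_nbrs[v] is E(v), the set of out-neighbors of v in G;
    edge_birth maps frozenset({u, v}) to the birth time of that edge.
    """
    found = []
    for v, nbrs in out_nbrs.items():
        for us in combinations(sorted(nbrs), k):
            sigma = (v,) + us
            # (k+1)-clique test: every pair must be an edge of the filtration.
            if all(frozenset(e) in edge_birth for e in combinations(sigma, 2)):
                t = simplex_birth_time(sigma)
                if t < math.inf:
                    found.append((frozenset(sigma), t))
    return found


def rips_birth_time(sigma, edge_birth, lam, eps):
    """Max edge birth time, or infinity if some vertex is removed before it."""
    t = max(edge_birth[frozenset(e)] for e in combinations(sigma, 2))
    cutoff = min(lam[i] for i in sigma) * (1 + eps) ** 2 / eps
    return t if t <= cutoff else math.inf


# Three points; edges are directed from smaller to larger insertion radius.
lam = [math.inf, 2.0, 1.0]
out_nbrs = {0: set(), 1: {0}, 2: {0, 1}}
edge_birth = {frozenset({0, 1}): 1.0, frozenset({0, 2}): 1.5, frozenset({1, 2}): 2.0}
triangles = find_simplices(
    out_nbrs, edge_birth, 2, lambda s: rips_birth_time(s, edge_birth, lam, 0.5)
)
```

Because every edge points from the smaller to the larger insertion radius, each simplex is discovered only from its vertex of smallest insertion radius, so no deduplication step is needed in this sketch.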


Theorem 10. Given the edges of a sparse nerve filtration, Algorithm 4 finds
the $k$-simplices of $\{S^\alpha\}$ in $\kappa^{O(k\rho)} n$ time, where $\rho$ is the doubling dimension of
the input metric, $\kappa = (\varepsilon^2 + 3\varepsilon + 2)/\varepsilon$, and $\varepsilon > 0$.

Proof. In Algorithm 4, for every vertex $v$ in the directed graph $G$, there are
$\binom{|E(v)|}{k}$ subsets with size $k$. In addition, by Lemma 7, $|E(v)| = \kappa^{O(\rho)}$.
Therefore, the total running time of this algorithm will be $\kappa^{O(k\rho)} n$. □

6 Removing Vertices 

Because the sparse filtration is a true filtration, no vertices are removed. When
the cone is truncated, no new simplices will be added using that vertex, but it is
still technically part of the filtration. The linear-size guarantee is a bound on the
total number of simplices in the complex. Thus, using methods such as zigzag
persistence or simplicial map persistence to fully remove these vertices when
they are no longer needed cannot improve the asymptotic performance. Still,
it may be practical to remove them (see [2]). A full theoretical or experimental
analysis of the cost tradeoff of using a heavier algorithm to do vertex removal
is beyond the scope of this paper.

In this section, we show that the geometric construction leads to a natural
choice of elementary simplicial maps (edge collapses) which all satisfy the
so-called link condition. In the persistence by simplicial maps work of Dey et
al. [?] and Boissonnat et al. [1], a key step in updating the data structures to
contract an edge is to first add simplices so that the so-called Link Condition is
satisfied. The link of a simplex $\sigma$ in a complex $K$ is defined as

$$\mathrm{Lk}\,\sigma = \{\tau \setminus \sigma \mid \tau \in K \text{ and } \sigma \subseteq \tau\}.$$

That is, the link of $\sigma$ is formed by removing the vertices of $\sigma$ from each of its
cofaces. An edge $\{u, v\} \in K$ satisfies the Link Condition if and only if

$$\mathrm{Lk}\,\{u, v\} = \mathrm{Lk}\,\{u\} \cap \mathrm{Lk}\,\{v\}.$$

Dey et al. [?] proved that edge contractions induce homotopy equivalences
when the link condition is satisfied. Thus, it gives a minimal local condition
to guarantee that the contraction preserves the topology. More recently, it was
shown that such a contraction does not change the persistent homology [?].
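
The link and the Link Condition are purely combinatorial, so they are easy to check on small examples. The sketch below represents a complex as a set of `frozenset` simplices; the helper names `closure`, `link`, and `satisfies_link_condition` are illustrative and not from the paper.

```python
from itertools import combinations


def closure(maximal_simplices):
    """All nonempty faces of the given simplices, as frozensets."""
    K = set()
    for s in maximal_simplices:
        for r in range(1, len(s) + 1):
            K.update(frozenset(f) for f in combinations(s, r))
    return K


def link(K, sigma):
    """Lk sigma: remove the vertices of sigma from each of its cofaces."""
    sigma = frozenset(sigma)
    return {tau - sigma for tau in K if sigma <= tau}


def satisfies_link_condition(K, u, v):
    """Check Lk{u, v} == Lk{u} & Lk{v} for the edge {u, v}."""
    return link(K, {u, v}) == link(K, {u}) & link(K, {v})


filled = closure([(0, 1, 2)])                 # a solid triangle
hollow = closure([(0, 1), (1, 2), (0, 2)])    # only its boundary
ok_filled = satisfies_link_condition(filled, 0, 1)
ok_hollow = satisfies_link_condition(hollow, 0, 1)
```

In the hollow triangle, vertex 2 lies in the links of both endpoints but not in the link of the edge, so contracting the edge would destroy the cycle; in the filled triangle the condition holds and the contraction is safe.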

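Proposition 11. If $(P, d)$ is a finite subset of a convex metric space and $\{S^\alpha\}$ is
its corresponding sparse filtration, then the last vertex $p_n$ has a neighbor $p_i$ such
that the edge $\{p_n, p_i\} \in S^\alpha$ satisfies the link condition, where $\alpha = \lambda_n(1 + \varepsilon)^2/\varepsilon$
and $\lambda_n$ is the insertion radius of $p_n$.

Proof. It follows directly from the definition of a link that
$\mathrm{Lk}\,\{u, v\} \subseteq \mathrm{Lk}\,\{u\} \cap \mathrm{Lk}\,\{v\}$ for all edges $\{u, v\}$. By the Covering Lemma (Lemma 1), we know that
there exists a $p_i \in P$ such that $b_n(\alpha) \subseteq b_i(\alpha)$. Thus, it suffices to check that
$\mathrm{Lk}\,\{i\} \cap \mathrm{Lk}\,\{n\} \subseteq \mathrm{Lk}\,\{i, n\}$. Because the vertices are ordered according to a
greedy permutation, $\lambda_j \ge \lambda_n$ for all $p_j \in P$. It follows that a simplex $J \in S^\alpha$ if
and only if $\bigcap_{j \in J} b_j(\alpha) \ne \emptyset$.

Let $J$ be any simplex in $\mathrm{Lk}\,\{i\} \cap \mathrm{Lk}\,\{n\}$. So, $i, n \notin J$ and
$\bigcap_{j \in J \cup \{n\}} b_j(\alpha) \ne \emptyset$.
Because $b_n(\alpha) \cap b_i(\alpha) = b_n(\alpha)$, it follows that $\bigcap_{j \in J \cup \{i, n\}} b_j(\alpha) \ne \emptyset$. Thus, we
have $J \in \mathrm{Lk}\,\{i, n\}$ as desired. □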

7 Conclusion 

In this paper, we gave a new geometric perspective on sparse filtrations for
topological data analysis that leads to a simple proof of correctness for all convex
metrics. By considering a nerve construction one dimension higher, the proofs
are primarily geometric and do not require explicit construction of simplicial
maps. This geometric view clarifies the non-zig-zag construction, while also
showing that removing vertices can be accomplished with simple edge contractions.